ON THE LOCAL CONVERGENCE OF QUASI-NEWTON METHODS
FOR CONSTRAINED OPTIMIZATION*

PAUL T. BOGGS†, JON W. TOLLE‡ and PYNG WANG§

Abstract. We consider the application of a general class of quasi-Newton methods to the solution of the
classical equality constrained nonlinear optimization problem. Specifically, we develop necessary and
sufficient conditions for the Q-superlinear convergence of such methods and present a companion linear
convergence theorem. The essential conditions relate to the manner in which the Hessian of the Lagrangian
function is approximated.


1. Introduction. In this paper we consider means of solving the equality constrained
nonlinear optimization problem

(NLP)    Minimize $f(x)$
         subject to $g(x) = 0$,

where it is assumed that $f: R^n \to R$ and $g: R^n \to R^m$ are smooth functions.

After two decades of experimentation with penalty function techniques, augmented
Lagrangian functions, gradient projection methods and other procedures,
research  on  numerical  methods  for  solving  NLP  has  recently  centered  on  implementing 
some  form  of  a  quasi-Newton  technique  for  this  constrained  problem.  The  preeminence 
of  quasi-Newton  methods  for  solving  unconstrained  nonlinear  problems  and  good 
experimental  results  to  date  lead  one  to  believe  that  this  approach  is  sound.  However, 
there  remain  numerous  questions  concerning  convergence,  rates  of  convergence, 
update  formulas,  and  implementation  that  are  as  yet  unanswered.  It  is  the  purpose 
of  this  paper  to  shed  light  on  some  of  these  questions,  in  particular,  on  the  local  and 
Q-superlinear  convergence  of  these  methods. 

We define a quasi-Newton method for NLP as an iterative scheme which generates
sequences $\{x^k\}$, $\{\lambda^k\}$, and $\{B_k\}$ from formulas

(1.1)    $\lambda^{k+1} = \Lambda(x^k, \lambda^k, B_k),$

(1.2)    $B_k \delta_x^k = -l_x(x^k, \lambda^{k+1}),$

(1.3)    $x^{k+1} = x^k + \alpha^k \delta_x^k,$

(1.4)    $B_{k+1} = \mathcal{B}(x^k, x^{k+1}, \lambda^k, \lambda^{k+1}, B_k),$

where $x^0$, $\lambda^0$ and $B_0$ are given, $\Lambda$ and $\mathcal{B}$ are appropriate update functions and
$l(x, \lambda) = f(x) + \lambda^T g(x)$ is the standard Lagrangian function. The step lengths $\alpha^k$ are
obviously important, but for local convergence theory $\alpha^k = 1$ is the optimal choice
and $\alpha^k$ will be taken to have this value throughout.
Much of the recent work on quasi-Newton methods for NLP can be put into this
framework. Powell [8], [9], following the work of Han [6], obtains $\delta_x^k$ by solving a
quadratic program

    Minimize $\nabla f(x^k)^T \delta_x^k + \tfrac{1}{2} (\delta_x^k)^T B_k \delta_x^k$
    subject to $\nabla g(x^k)^T \delta_x^k = -g(x^k),$

and chooses $\lambda^{k+1}$ to be the optimal multiplier vector for this program. $B_k$ is a standard
rank two update approximation to $l_{xx}$, with a modification which assures that the $B_k$
remain positive definite.

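Tapia [10], [11] shows that Powell's choices of $\delta_x^k$ and $\lambda^{k+1}$ can be obtained by
applying a structured quasi-Newton method to the system

(1.5)    $l_x(x, \lambda) = 0, \quad l_\lambda(x, \lambda) = 0.$

That is, $x^{k+1}$ and $\lambda^{k+1}$ are obtained from the equations

(1.6)    $\begin{bmatrix} B_k & \nabla g(x^k) \\ \nabla g(x^k)^T & 0 \end{bmatrix} \begin{bmatrix} \delta_x^k \\ \delta_\lambda^k \end{bmatrix} = \begin{bmatrix} -l_x(x^k, \lambda^k) \\ -g(x^k) \end{bmatrix},$

(1.7)    $\lambda^{k+1} = \lambda^k + \delta_\lambda^k,$

with $x^{k+1}$ given by (1.3). Here again $B_k$ is an approximation to $l_{xx}$, so that the
$(n + m) \times (n + m)$ matrix in (1.6) is a structured approximation to the Jacobian matrix
for the system (1.5). It is easily seen that the solutions to (1.6), (1.7), given by

(1.8)    $\lambda^{k+1} = (\nabla g(x^k)^T B_k^{-1} \nabla g(x^k))^{-1} \{ g(x^k) - \nabla g(x^k)^T B_k^{-1} \nabla f(x^k) \},$

(1.9)    $\delta_x^k = -B_k^{-1} \{ I - \nabla g(x^k) (\nabla g(x^k)^T B_k^{-1} \nabla g(x^k))^{-1} \nabla g(x^k)^T B_k^{-1} \} \nabla f(x^k)$
         $\qquad - B_k^{-1} \nabla g(x^k) (\nabla g(x^k)^T B_k^{-1} \nabla g(x^k))^{-1} g(x^k),$

also satisfy (1.1) and (1.2). In addition to the formula (1.8), Tapia presents a number
of other possible updates for $\lambda$, preferring, for theoretical reasons, a double update
of $\lambda$.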
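The equivalence between the block system (1.6)-(1.7) and the closed forms (1.8)-(1.9) can be checked numerically. The following is a minimal sketch, assuming NumPy; the function names and the data are illustrative and do not come from the paper. The closed-form routine evaluates (1.8) and then recovers the step from (1.2), which is algebraically the same as (1.9).

```python
import numpy as np

def step_from_kkt_system(B, G, grad_f, g, lam):
    """Solve the block system (1.6), then apply the multiplier update (1.7)."""
    n, m = G.shape
    K = np.block([[B, G], [G.T, np.zeros((m, m))]])
    rhs = -np.concatenate([grad_f + G @ lam, g])  # -(l_x(x^k, lam^k), g(x^k))
    sol = np.linalg.solve(K, rhs)
    return sol[:n], lam + sol[n:]

def step_from_closed_form(B, G, grad_f, g):
    """Evaluate (1.8), then recover the step from (1.2), equivalent to (1.9)."""
    Binv_G = np.linalg.solve(B, G)
    Binv_grad_f = np.linalg.solve(B, grad_f)
    lam_next = np.linalg.solve(G.T @ Binv_G, g - G.T @ Binv_grad_f)
    dx = -(Binv_grad_f + Binv_G @ lam_next)
    return dx, lam_next

# Illustrative data (an assumption, not from the paper): n = 3, m = 1.
B = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
G = np.array([[1.0], [2.0], [-1.0]])   # plays the role of grad g(x^k), n x m
grad_f = np.array([1.0, -2.0, 0.5])
g = np.array([0.3])
lam = np.array([0.7])

dx_kkt, lam_kkt = step_from_kkt_system(B, G, grad_f, g, lam)
dx_cf, lam_cf = step_from_closed_form(B, G, grad_f, g)
```

Both routines should return the same step and multiplier, and the new multiplier does not depend on the value of `lam` passed to the block solve, in line with the remark that (1.8) involves only $x^k$ and $B_k$.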

In [1] the authors have considered a variation of the system (1.5) in which the
Lagrangian function $l(x, \lambda)$ is replaced by a more general Lagrangian $M(x, \lambda)$ which
is quadratic in $\lambda$. The purpose for introducing this generalization was to obtain better
convergence from poor starting points. Locally, however, the quasi-Newton equations
derived from $M(x, \lambda)$ are nearly identical to those of (1.6).

The local convergence of these methods has been investigated by a number of
authors. Before reviewing their results, we point out the distinction between
Q-superlinear and R-superlinear rates of convergence and the difference between the
convergence rates of the vector $\{(x^k, \lambda^k)\}$ and its component $\{x^k\}$. Recall that a vector
sequence $\{v^k\}$ converges R-superlinearly to $v^*$ if and only if the sequence $\{|v^k - v^*|\}$
is bounded by a sequence which converges Q-superlinearly to zero. Because an
R-superlinearly convergent sequence need not be even Q-linearly convergent,
R-superlinear convergence by itself is computationally meaningless. It is also the case
that the Q-superlinear convergence of $\{v^k\}$ implies only the R-superlinear convergence
of its components. (See Tapia [10, section 8] for a more detailed discussion.) Since
$\lambda^{k+1}$ depends only on $x^k$ and not on $\lambda^k$, to be most effective the structured
quasi-Newton method should yield Q-superlinear convergence of the sequence $\{x^k\}$.

The major convergence analyses center on how well and in what sense the $B_k$
generated by (1.4) approximate the Hessian of the Lagrangian function at $(x^*, \lambda^*)$,
the optimal solution pair. These analyses are based on similar studies of the
unconstrained problem. In the latter case extensive use is made of the Broyden-Dennis-Moré
analysis of the quasi-Newton update formulas and the Dennis-Moré characterization
of Q-superlinear convergence. (See Dennis and Moré [4] for a survey of these results.)
This characterization shows that Q-superlinear convergence in the unconstrained case
occurs if and only if

(1.10)    $\dfrac{|(B_k - \nabla^2 f(x^*)) \delta^k|}{|\delta^k|} \to 0,$

where $\nabla^2 f(x^*)$ is the Hessian of the function to be minimized, $\delta^k$ is the step generated,
and $B_k$ is the approximate Hessian.

For the constrained case Powell [9] develops a procedure for updating the $B_k$
and shows that, under the second order sufficiency conditions, the resulting method
is at least R-superlinearly convergent in $x$. He also provides a condition related to
(1.10) which is sufficient for "2-step" superlinear convergence. In particular, for the
projection matrix

    $P(x) = I - \nabla g(x) (\nabla g(x)^T \nabla g(x))^{-1} \nabla g(x)^T,$

Powell shows that

(1.11)    $\dfrac{|P(x^k)(B_k - l_{xx}(x^*, \lambda^*)) P(x^k) \delta_x^k|}{|\delta_x^k|} \to 0$

is sufficient for

    $\dfrac{|x^{k+1} - x^*|}{|x^{k-1} - x^*|} \to 0.$

Powell was not able to show that his method satisfies this condition, however.

Also under the second order sufficiency conditions, Han [6] demonstrates the
Q-superlinear convergence of $\{(x^k, \lambda^k)\}$ when a form of Greenstadt's update is used
in (1.4). However, Han requires the stronger assumption that $l_{xx}(x^*, \lambda^*)$ be positive
definite in order to obtain the Q-superlinear convergence in $(x, \lambda)$ for the BFGS
update. It should be noted that Greenstadt's method is not computationally attractive,
since it almost always performs poorly in spite of its theoretical properties. To
guarantee that $l_{xx}(x^*, \lambda^*)$ is positive definite requires the addition of a penalty term
to the Lagrangian, a computationally unattractive option.

Tapia [10], [11] and Glad [5] obtain Q-superlinear convergence in $(x, \lambda)$ for
$l_{xx}(x^*, \lambda^*)$ positive definite. Tapia [10] obtains the stronger result of Q-superlinear
convergence in $x$ but at the cost of an additional update of $\lambda$ at each step.

In this paper we first characterize Q-superlinear convergence in $x$ for these
methods (Theorem 3.1). The characterization is a natural generalization of the
Dennis-Moré result (1.10). Simply put, it states that Q-superlinear convergence in $x$ occurs
if and only if

(1.12)    $\dfrac{|P(x^k)(B_k - l_{xx}(x^*, \lambda^*)) \delta_x^k|}{|\delta_x^k|} \to 0,$

where $P(x)$ is the projection matrix given above. Note that (1.12) does not contain a
post-multiplication of $B_k - l_{xx}(x^*, \lambda^*)$ by $P(x^k)$ as does (1.11), and hence, it takes into
account the action of $B_k - l_{xx}(x^*, \lambda^*)$ off of the null space of $\nabla g(x^k)^T$, which (1.11)
does not.

Using the characterization (1.12), we then show (Theorem 3.2) that Q-superlinear
convergence in $x$ is obtained when $l_{xx}(x^*, \lambda^*)$ is positive definite. This is a slightly
stronger result than those previously published and reviewed above. Finally, a sufficient
condition for Q-linear convergence in $x$ is established (Theorem 3.3). This theorem
also makes use of the matrices $P(x^k)(B_k - l_{xx}(x^*, \lambda^*))$, requiring that they be small in
norm for all $k$. Hence it provides a complementary result to Theorem 3.1.

2. Basic notation and assumptions. For the problem NLP considered here we
assume $f$ and $g$ are at least three times continuously differentiable and that the gradient
$\nabla g(x)$ has full rank for all $x$. In addition we assume that NLP has a (local) solution
$x^*$ at which the second order sufficiency conditions hold. That is, there exists a unique
vector $\lambda^* \in R^m$ such that

(i)  $l_x(x^*, \lambda^*) = 0,$

(ii) $\nabla g(x^*)^T y = 0$, $y \ne 0$ implies $y^T l_{xx}(x^*, \lambda^*) y > 0.$

For functions $h: R^n \to R^q$, we denote the Jacobian and Hessian matrices by $\nabla h(x)$
and $\nabla^2 h(x)$, respectively. Here, for notational convenience, $\nabla h(x)$ is always written
as an $n \times q$ matrix. For functions of $x$ and $\lambda$, we denote derivatives with respect to $x$
or $\lambda$ by subscripts; hence, $l_x(x, \lambda) = \nabla f(x) + \nabla g(x) \lambda$, $l_{x\lambda}(x, \lambda) = \nabla g(x)$, etc.

Vectors are always column vectors unless transposed, the transposition operation
for vectors and matrices being indicated by a superscript $T$.

In the theory of constrained minimization, the projection of vectors onto the
tangent space of the level sets of the constraints plays an important role. For a given
$\bar{x}$, the matrix

    $P(\bar{x}) = I - \nabla g(\bar{x}) (\nabla g(\bar{x})^T \nabla g(\bar{x}))^{-1} \nabla g(\bar{x})^T$

projects vectors onto the tangent space of the smooth manifold

    $\{x : g(x) = g(\bar{x})\}$

at $x = \bar{x}$. The projection onto the orthogonal complement of this tangent space will
be denoted by $Q(x)$. Thus

    $Q(x) = I - P(x).$

$|\cdot|$ will everywhere denote the $l_2$-norm. In §3, it is necessary to use the Frobenius
norm for matrices. The Frobenius norm weighted by the matrix $M$ is denoted by $\|\cdot\|_M$.
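The defining properties of $P$ and $Q$ are easy to confirm numerically. The sketch below, assuming NumPy, builds both projectors from an arbitrary full-rank matrix standing in for $\nabla g(\bar{x})$; the helper name `projectors` and the data are illustrative only.

```python
import numpy as np

def projectors(G):
    """Return (P, Q) for an n x m full-rank matrix G standing in for grad g(x).

    Q projects onto the range of G; P = I - Q projects onto the null space
    of G^T, i.e. the tangent space of the constraint level set.
    """
    n = G.shape[0]
    Q = G @ np.linalg.solve(G.T @ G, G.T)
    return np.eye(n) - Q, Q

# Illustrative data (an assumption): n = 3, m = 2.
G = np.array([[1.0, 0.0], [2.0, 1.0], [-1.0, 3.0]])
P, Q = projectors(G)
```

Here `P @ G` vanishes, `Q @ G` reproduces `G`, and both matrices are symmetric and idempotent, which is what makes them orthogonal projections.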

3. Necessary and sufficient conditions for superlinear convergence. We consider
the algorithm obtained by applying a structured quasi-Newton method to the system
(1.5), thus obtaining the formulas given in (1.6) and (1.7) with solutions (1.8) and
(1.9). This algorithm has the important property that $\delta_x^k$ satisfies the linearized
constraints, i.e.,

    $g(x^k) + \nabla g(x^k)^T \delta_x^k = 0.$

Extending the analysis by Powell [9], we obtain a necessary and sufficient condition
for Q-superlinear convergence in $x$, given linear convergence and a few basic
assumptions on the approximating matrices $B_k$. The essential condition is that the matrix $B_k$
must approximate the Hessian matrix $l_{xx}(x^*, \lambda^*)$ in the sense of Dennis and Moré [3]
but only when projected onto the tangent hyperplane to the surface $\{x : g(x) = g(x^k)\}$.

We assume in the remainder of this section that the $B_k$ are symmetric, nonsingular,
and uniformly bounded. In addition, we assume that the matrices $B_k$ are uniformly
positive definite on the null space of $\nabla g(x^*)^T$. That is, there exists a $\beta > 0$ such that
whenever $y \ne 0$ and $\nabla g(x^*)^T y = 0$,

    $y^T B_k y \ge \beta |y|^2$

for every $k$. Thus we require the $B_k$ to satisfy the second order sufficiency condition
satisfied by $l_{xx}(x^*, \lambda^*)$ (see §2). This assumption is slightly weaker than that of Powell
who assumes that the $B_k$ are positive definite on $R^n$ and uniformly positive definite
on the null space of $\nabla g(x^*)^T$.

Note that with the above assumptions on the matrices $B_k$, the matrices
$\nabla g(x^*)^T B_k^{-1} \nabla g(x^*)$ are nonsingular. For if $\nabla g(x^*)^T B_k^{-1} \nabla g(x^*) z = 0$, then $B_k^{-1} \nabla g(x^*) z$
is in the null space of $\nabla g(x^*)^T$, and hence, $z \ne 0$ implies the contradiction
$0 < (B_k^{-1} \nabla g(x^*) z)^T B_k (B_k^{-1} \nabla g(x^*) z) = z^T \nabla g(x^*)^T B_k^{-1} \nabla g(x^*) z = 0$. It follows that (1.8)
and (1.9) are well-defined for $x^k$ sufficiently near to $x^*$.

Our first result is a generalization of a result of Powell [9].

Lemma 1. The value of $\delta_x^k$ is invariant under the transformation

    $B_k \to B_k + \nabla g(x^k) U^T = \bar{B}_k,$

where $U$ is any $n \times m$ matrix such that both $\bar{B}_k$ and $\nabla g(x^k)^T \bar{B}_k^{-1} \nabla g(x^k)$ are nonsingular.

Proof. It follows from (1.9) that the lemma is true if the matrices

    $A_1 = \bar{B}_k^{-1} \nabla g(x^k) [\nabla g(x^k)^T \bar{B}_k^{-1} \nabla g(x^k)]^{-1},$

    $A_2 = \bar{B}_k^{-1} - \bar{B}_k^{-1} \nabla g(x^k) [\nabla g(x^k)^T \bar{B}_k^{-1} \nabla g(x^k)]^{-1} \nabla g(x^k)^T \bar{B}_k^{-1}$

are independent of $U$. The assumptions on $U$ allow the use of the Sherman-Morrison-Woodbury
formula (see, e.g., Ortega and Rheinboldt [7, p. 50]) to express $\bar{B}_k^{-1}$ as

    $\bar{B}_k^{-1} = B_k^{-1} - B_k^{-1} \nabla g(x^k) [I + U^T B_k^{-1} \nabla g(x^k)]^{-1} U^T B_k^{-1}.$

Substitution of this expression into $A_1$ yields

    $A_1 = B_k^{-1} \nabla g(x^k) [\nabla g(x^k)^T B_k^{-1} \nabla g(x^k)]^{-1},$

which establishes the result for $A_1$. For $A_2$, note that

    $A_2 = [I - A_1 \nabla g(x^k)^T] \bar{B}_k^{-1}.$

Again using the expression for $\bar{B}_k^{-1}$ yields the desired result.

It follows from the assumptions made on the $B_k$ that if $U = \gamma \nabla g(x^k)$, where $\gamma$ is
a sufficiently large positive constant, then the hypotheses of the lemma are satisfied.
It should also be noted that the value of $\lambda^{k+1}$ is not invariant under the given
transformation in $B_k$. Thus, a variety of choices of $\lambda^{k+1}$ give rise to the same value
of $\delta_x^k$ (as demonstrated by Tapia in [11]). However, it is easily seen that the first order
necessary conditions and equation (1.8) imply that if $\{x^k\} \to x^*$ then $\{\lambda^{k+1}\} \to \lambda^*$.
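Lemma 1 and the remark on $\lambda^{k+1}$ can be illustrated on a small example. The sketch below, assuming NumPy, uses a matrix $B$ that is indefinite on $R^3$ but positive definite on the null space of $\nabla g^T$, and applies the transformation with $U = \gamma \nabla g$. The data, the value $\gamma = 5$, and the function name are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def step_and_multiplier(B, G, grad_f, g):
    """Multiplier from (1.8) and step from (1.2) for a nonsingular B."""
    Binv_G = np.linalg.solve(B, G)
    Binv_grad_f = np.linalg.solve(B, grad_f)
    lam_next = np.linalg.solve(G.T @ Binv_G, g - G.T @ Binv_grad_f)
    dx = -(Binv_grad_f + Binv_G @ lam_next)
    return dx, lam_next

# B is indefinite, yet positive definite on {y : G^T y = 0} = span(e1, e2).
B = np.diag([1.0, 1.0, -1.0])
G = np.array([[0.0], [0.0], [1.0]])
grad_f = np.array([1.0, 2.0, 3.0])
g = np.array([0.5])

gamma = 5.0                       # illustrative choice of the constant
B_bar = B + gamma * (G @ G.T)     # transformation of Lemma 1 with U = gamma * G

dx, lam_next = step_and_multiplier(B, G, grad_f, g)
dx_bar, lam_bar = step_and_multiplier(B_bar, G, grad_f, g)
```

For this data the two steps coincide, `B_bar` is positive definite, and the two multipliers differ, which matches the statement that only $\delta_x^k$ is invariant.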

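For convenience, we now write (1.9) in the form

(3.1)    $-B_k \delta_x^k = V_k \nabla f(x^k) + W_k g(x^k),$

so that, comparing with (1.9), $W_k = \nabla g(x^k) (\nabla g(x^k)^T B_k^{-1} \nabla g(x^k))^{-1}$ and
$V_k = I - W_k \nabla g(x^k)^T B_k^{-1}$. We note that the two vectors on the right-hand side are
conjugate with respect to $B_k^{-1}$; in fact, $V_k^T B_k^{-1} W_k = 0$. Letting $P_k$ be the projection
matrix at $x^k$ defined in §2, we see that

(3.2a)    $P_k V_k = P_k,$

(3.2b)    $V_k P_k = V_k,$

(3.2c)    $P_k W_k = 0.$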
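The relations (3.1) and (3.2) can be verified numerically. In the sketch below, assuming NumPy, $V_k$ and $W_k$ are read off by matching (3.1) against (1.9); this reading and the test data are assumptions made for illustration.

```python
import numpy as np

# Illustrative data (an assumption): n = 3, m = 2.
B = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])  # symmetric
G = np.array([[1.0, 0.0], [2.0, 1.0], [-1.0, 3.0]])                # grad g(x^k)
grad_f = np.array([1.0, -2.0, 0.5])
g = np.array([0.3, -0.1])

n = B.shape[0]
Binv = np.linalg.inv(B)
W = G @ np.linalg.inv(G.T @ Binv @ G)               # W_k
V = np.eye(n) - W @ G.T @ Binv                      # V_k
P = np.eye(n) - G @ np.linalg.solve(G.T @ G, G.T)   # P_k from section 2

# Step computed from (1.8) and (1.2), independently of V and W.
lam_next = np.linalg.solve(G.T @ Binv @ G, g - G.T @ Binv @ grad_f)
dx = -Binv @ (grad_f + G @ lam_next)

lhs_31 = -B @ dx
rhs_31 = V @ grad_f + W @ g
```

With these definitions `lhs_31` and `rhs_31` agree, and the products `P @ V`, `V @ P`, `P @ W` and `V.T @ Binv @ W` reproduce (3.2a), (3.2b), (3.2c) and the conjugacy relation.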
The next lemma is also a modification of the results of Powell. Two positive
sequences, $\{s_k\}$ and $\{t_k\}$, which converge to zero are said to be of the same order if
there exist positive constants $c_1$ and $c_2$ such that for $k$ sufficiently large,

    $c_1 \le \dfrac{s_k}{t_k} \le c_2.$

Lemma 2. Suppose $\{x^k\} \to x^*$ with a linear rate of convergence. Then the sequences
$\{|x^k - x^*|\}$, $\{|\delta_x^k|\}$, and $\{|g(x^k)| + |P_k \nabla f(x^k)|\}$ converge to zero and are of the same order.

Proof. Using Lemma 1 and the properties of the $B_k$, we see that by choosing $\gamma$
sufficiently large we may replace the $B_k$ by $\bar{B}_k$ for which $\bar{B}_k$ and $\bar{B}_k^{-1}$ are uniformly
bounded and positive definite. The change does not affect the value of $\delta_x^k$ or the
relations (3.2). Now using (3.2b) and (3.1) there exists an $\alpha_1 > 0$ such that

    $|\delta_x^k| \le \alpha_1 \{ |g(x^k)| + |P_k \nabla f(x^k)| \}.$

(3.1), (3.2a), (3.2c), and the linearized constraint equation yield the existence of an
$\alpha_2 > 0$ such that

    $\{ |g(x^k)| + |P_k \nabla f(x^k)| \} \le \alpha_2 |\delta_x^k|.$

Thus, $\{|\delta_x^k|\}$ and $\{|g(x^k)| + |P_k \nabla f(x^k)|\}$ are of equivalent order. That $\{|\delta_x^k|\}$ and $\{|x^k - x^*|\}$
are of the same order follows from the consequence of linear convergence

    $1 - r \le \dfrac{|\delta_x^k|}{|x^k - x^*|} \le 1 + r,$

where $r < 1$. This completes the proof.

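Now let $\{G_k\}$ be any sequence of matrices satisfying

(i)  $G_k \delta_x^k = l_x(x^{k+1}, \lambda^{k+1}) - l_x(x^k, \lambda^{k+1}),$

(ii) $G_k \to l_{xx}(x^*, \lambda^*).$

For example, $G_k$ could be chosen as

    $G_k = \int_0^1 l_{xx}(x^k + t \delta_x^k, \lambda^{k+1}) \, dt.$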


Lemma 3. Assume $\{x^k\} \to x^*$ linearly. Then there exists an $\alpha > 0$ such that

    $|x^{k+1} - x^*| \le \alpha [ |\delta_x^k|^2 + |P_k (G_k - B_k) \delta_x^k| ].$

Proof. By Lemma 2 there exists an $\eta > 0$ such that

(3.3)    $|x^{k+1} - x^*| \le \eta \{ |g(x^{k+1})| + |P_{k+1} \nabla f(x^{k+1})| \}.$

Now

(3.4)    $g(x^{k+1}) = g(x^k) + \nabla g(x^k)^T \delta_x^k + O(|\delta_x^k|^2) = O(|\delta_x^k|^2).$

From (i) above and (1.3),

    $(G_k - B_k) \delta_x^k = l_x(x^{k+1}, \lambda^{k+1}).$

Using the fact that $P_{k+1} \nabla g(x^{k+1}) = 0$, we obtain the identity

(3.5)    $(P_{k+1} - P_k)(G_k - B_k) \delta_x^k + P_k (G_k - B_k) \delta_x^k = P_{k+1} \nabla f(x^{k+1}).$

The smoothness assumptions on $g(x)$ assure that

(3.6)    $P_{k+1} - P_k = O(|\delta_x^k|)$

and the lemma follows from (3.3)-(3.6) and the uniform boundedness of the $G_k$ and
the $B_k$.

We can now state and prove the necessary and sufficient conditions for
Q-superlinear convergence for the structured quasi-Newton method as applied to the
system (1.5).

Theorem 3.1. Let $\{\delta_x^k, \delta_\lambda^k\}$ satisfy (1.6) where the matrices $B_k$ satisfy the conditions
stated at the beginning of the section. Suppose $\{x^k\} \to x^*$ linearly. Then $\{x^k\} \to x^*$
Q-superlinearly if and only if

(3.7)    $\lim_{k \to \infty} \dfrac{|P_k (B_k - l_{xx}(x^*, \lambda^*)) \delta_x^k|}{|\delta_x^k|} = 0,$

where $P_k = I - \nabla g(x^k) (\nabla g(x^k)^T \nabla g(x^k))^{-1} \nabla g(x^k)^T$.

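Proof. Let $\{G_k\}$ be a sequence of approximations to $l_{xx}(x^*, \lambda^*)$ as defined above.
Clearly $G_k$ can replace $l_{xx}(x^*, \lambda^*)$ in (3.7). Now suppose (3.7) holds. Then by Lemma 3

    $|x^{k+1} - x^*| = o(|\delta_x^k|).$

But by Lemma 2 $\{|\delta_x^k|\}$ and $\{|x^k - x^*|\}$ are of the same order; hence there is a constant
$\alpha > 0$ such that

    $\dfrac{|x^{k+1} - x^*|}{|x^k - x^*|} \le \alpha \dfrac{|x^{k+1} - x^*|}{|\delta_x^k|} = \alpha \dfrac{o(|\delta_x^k|)}{|\delta_x^k|},$

which demonstrates Q-superlinear convergence.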

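For the converse, suppose $\{x^k\} \to x^*$ Q-superlinearly. Using Lemma 2 and (3.5),
we have that, for some $\eta > 0$,

    $|P_k (B_k - G_k) \delta_x^k + (P_{k+1} - P_k)(B_k - G_k) \delta_x^k| + |g(x^{k+1})| \le \eta |x^{k+1} - x^*|,$

which, together with (3.4) and (3.6), imply that

    $\dfrac{|P_k (B_k - G_k) \delta_x^k|}{|\delta_x^k|} \le \eta \dfrac{|x^{k+1} - x^*|}{|\delta_x^k|} + O(|\delta_x^k|).$

Again using Lemma 2, we have

    $\dfrac{|P_k (B_k - G_k) \delta_x^k|}{|\delta_x^k|} \le \eta \dfrac{|x^{k+1} - x^*|}{|\delta_x^k|} + O(|\delta_x^k|) \le \bar{\eta} \dfrac{|x^{k+1} - x^*|}{|x^k - x^*|} + O(|\delta_x^k|).$

Letting $k \to \infty$ (and hence $|\delta_x^k| \to 0$) gives the desired results.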

We note that if $f(x)$ is augmented by the penalty term $c \, g(x)^T g(x)$ with $c$ a large
positive constant, then the second order sufficiency conditions imply that the Hessian
of the augmented Lagrangian is positive definite at $(x^*, \lambda^*)$. Moreover, it is easily
shown that the formula (1.9) is unchanged by this added term; thus the only effect is
in the update formula (1.4). If, as is common, the $B_k$ are chosen to approximate
$l_{xx}(x^*, \lambda^*)$ in the sense that

(3.8)    $B_{k+1} \delta_x^k = y^k = l_x(x^{k+1}, \lambda^{k+1}) - l_x(x^k, \lambda^{k+1}),$

then the assumption that $l_{xx}(x^*, \lambda^*)$ is positive definite makes the update formulas
which preserve positive definiteness, such as the DFP or BFGS, natural candidates
for use in this scheme. The next theorem shows that Q-superlinear convergence is
achieved in these cases (cf. Han [6], Tapia [11], and Glad [5]). The following lemma
is important for our proof.

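Lemma 4. Let $B_{k+1}$ be derived from $B_k$ by either the DFP or the BFGS update
with $y^k$ given by (3.8). Assume that $l_{xx}(x^*, \lambda^*)$ is positive definite and $\{x^k\}$ converges
linearly to $x^*$. Let $B^*_{k+1}$ be generated from $B_k$ using the same update formula as for
$B_{k+1}$ but with $y^k$ replaced by $y_*^k$, where

    $y_*^k = l_x(x^{k+1}, \lambda^*) - l_x(x^k, \lambda^*).$

Then

(3.9)    $|B_{k+1} - B^*_{k+1}| \le \alpha \, \sigma((x^{k+1}, \lambda^{k+1}), (x^k, \lambda^k)),$

where $\alpha$ is a constant independent of $k$ and

    $\sigma((x^{k+1}, \lambda^{k+1}), (x^k, \lambda^k)) = \max \{ |(x^i, \lambda^i) - (x^*, \lambda^*)| : i = k, k+1 \}.$

Proof. We prove the lemma for the BFGS update. The proof for the DFP update
is similar but more laborious. From the definitions of $y^k$ and $y_*^k$ we have

    $y^k - y_*^k = (\nabla g(x^{k+1}) - \nabla g(x^k)) (\lambda^{k+1} - \lambda^*),$

and thus, there is a constant $\beta_1$ such that

    $|y^k - y_*^k| \le \beta_1 |\delta_x^k| \, |\lambda^{k+1} - \lambda^*|.$

From the assumptions there exist positive constants $\eta_1$, $\eta_2$ such that for large $k$

    $(y^k)^T \delta_x^k \ge \eta_1 |\delta_x^k|^2, \quad |y^k| \le \eta_2 |\delta_x^k|,$

    $(y_*^k)^T \delta_x^k \ge \eta_1 |\delta_x^k|^2, \quad |y_*^k| \le \eta_2 |\delta_x^k|.$

For the BFGS update,

    $B_{k+1} - B^*_{k+1} = \dfrac{((y_*^k)^T \delta_x^k) \, y^k (y^k)^T - ((y^k)^T \delta_x^k) \, y_*^k (y_*^k)^T}{((y^k)^T \delta_x^k)((y_*^k)^T \delta_x^k)},$

from which it follows that

    $|B_{k+1} - B^*_{k+1}| \le 3 \beta_1 (\eta_2^2 / \eta_1^2) \, |\lambda^{k+1} - \lambda^*|.$

Inequality (3.9) follows immediately.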
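The rank-two expression for $B_{k+1} - B^*_{k+1}$ used in the proof can be checked against the standard BFGS formula. The sketch below, assuming NumPy, applies that formula with two different vectors in place of $y^k$ and $y_*^k$; the numerical values are illustrative and are not derived from any particular Lagrangian.

```python
import numpy as np

def bfgs_update(B, d, y):
    """Standard BFGS update of B for a step d and a gradient difference y."""
    Bd = B @ d
    return B - np.outer(Bd, Bd) / (d @ Bd) + np.outer(y, y) / (y @ d)

# Illustrative data (an assumption); both y @ d and y_star @ d are positive.
B = np.array([[2.0, 0.0], [0.0, 2.0]])
d = np.array([1.0, 0.5])          # plays the role of the step
y = np.array([2.0, 1.5])          # plays the role of y^k from (3.8)
y_star = np.array([1.8, 1.6])     # plays the role of the comparison vector

B_next = bfgs_update(B, d, y)
B_star = bfgs_update(B, d, y_star)

# Rank-two difference as written in the proof of Lemma 4.
yd, ysd = y @ d, y_star @ d
difference = (ysd * np.outer(y, y) - yd * np.outer(y_star, y_star)) / (yd * ysd)
```

Because both updates share the term built from `B` and `d`, their difference involves only the two outer products, which is why the bound in the proof depends on $|y^k - y_*^k|$ alone. Each update also satisfies its own secant condition, as in (3.8).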

Theorem 3.2. Assume $l_{xx}(x^*, \lambda^*)$ is positive definite. If the $B_k$ are obtained by
either the DFP or BFGS formulas with $y^k$ defined by (3.8) and if $\{x^k\}$ converges to $x^*$
linearly, then the convergence is Q-superlinear.

Proof. The proof follows the lines of argument used in unconstrained optimization.
Let $y_*^k$ and $B^*_{k+1}$ be as defined in Lemma 4. From our assumptions $x^*$ is an
unconstrained minimum of $l(x, \lambda^*)$ and hence the results of Broyden, Dennis and Moré [2]
for the unconstrained case can be applied to obtain the fundamental inequality:

(3.10)    $\|B^*_{k+1} - l_{xx}(x^*, \lambda^*)\|_M \le \{ (1 - \alpha \theta_k^2)^{1/2} + \alpha_1 \sigma(x^{k+1}, x^k) \} \|B_k - l_{xx}(x^*, \lambda^*)\|_M$
          $\qquad + \alpha_2 \sigma(x^{k+1}, x^k),$

where $\alpha_1$ and $\alpha_2$ are constants independent of $k$,

    $\sigma(x^{k+1}, x^k) = \max \{ |x^{k+1} - x^*|, |x^k - x^*| \},$

    $\theta_k = \dfrac{|M (B_k - l_{xx}(x^*, \lambda^*)) \delta_x^k|}{\|B_k - l_{xx}(x^*, \lambda^*)\|_M \, |M^{-1} \delta_x^k|},$

    $M = l_{xx}(x^*, \lambda^*)^{-1/2},$

and the $M$-norm, $\|Q\|_M$, stands for the Frobenius norm of the matrix $MQM$. The
triangle inequality can now be used with (3.9) and (3.10) to establish

(3.11)    $\|B_{k+1} - l_{xx}(x^*, \lambda^*)\|_M \le \{ (1 - \alpha \theta_k^2)^{1/2} + \alpha_3 \sigma(x^{k+1}, x^k) \} \|B_k - l_{xx}(x^*, \lambda^*)\|_M$
          $\qquad + \alpha_4 \sigma((x^{k+1}, \lambda^{k+1}), (x^k, \lambda^k)),$

where $B_{k+1}$ is the DFP or BFGS update of $B_k$ and $\sigma$ is defined as in Lemma 4.
Since $\{x^k\}$ converges to $x^*$ (and hence $\{\lambda^k\}$ converges to $\lambda^*$) it follows that
$\{\|B_k - l_{xx}(x^*, \lambda^*)\|_M\}$ has a limit (Dennis and Moré [3]). If the limit is not zero, then (3.11)
implies $\theta_k \to 0$; if the limit is zero then $\|B_k - l_{xx}(x^*, \lambda^*)\|_M \to 0$. In either case we have

    $\dfrac{|(B_k - l_{xx}(x^*, \lambda^*)) \delta_x^k|}{|\delta_x^k|} \to 0.$

Since the projection matrices $P(x^k)$ are bounded, Theorem 3.1 can be applied to
establish the Q-superlinear convergence.

Theorem 3.2 rests heavily on the assumption that $l_{xx}(x^*, \lambda^*)$ is positive definite.
In theory, $l_{xx}(x^*, \lambda^*)$ need only be positive definite on the null space of $\nabla g(x^*)^T$.
Nevertheless, most implementations of the quasi-Newton approach use updates (such
as BFGS) which maintain positive definiteness of the $B_k$ (with some ad hoc scheme
to assure that $(y^k)^T \delta_x^k$ is positive). It remains an open question as to whether
Q-superlinear convergence can be guaranteed with these approaches.

In the previous theorems, linear convergence of the $\{x^k\}$ is assumed. However,
if the bounded deterioration inequality (3.11) holds, then linear convergence can be
achieved by requiring $|x^0 - x^*|$ and $|B_0 - l_{xx}(x^*, \lambda^*)|$ to be sufficiently small. As shown
above, (3.11) holds when $l_{xx}(x^*, \lambda^*)$ is positive definite. Without the positive definite
assumptions the usual conditions for linear convergence require that $|B_k - l_{xx}(x^*, \lambda^*)|$
be small for all $k$. (See Han [6] and Tapia [10] for the relevant results.) In the next
theorem we relax this restriction by showing linear convergence under the requirement
that $|P(x^k)(B_k - l_{xx}(x^*, \lambda^*))|$ be small for all $k$. This theorem further illustrates the
significance of the projection operator in the quasi-Newton theory for constrained
minimization.

Theorem 3.3. Let the $B_k$ satisfy

    $|B_k^{-1}| \le \eta$

for some $\eta > 0$. Then there exist positive constants $\epsilon$ and $\delta$ such that if

(i)  $|x^0 - x^*| < \delta,$

(ii) $|P(x^*)(B_k - l_{xx}(x^*, \lambda^*))| < \epsilon$ for all $k \ge 0$,

then the sequence $\{x^k\}$ generated by

(3.12)    $x^{k+1} = x^k - B_k^{-1} l_x(x^k, \lambda_k(x^k)),$

where

(3.13)    $\lambda_k(x) = (\nabla g(x)^T B_k^{-1} \nabla g(x))^{-1} (g(x) - \nabla g(x)^T B_k^{-1} \nabla f(x)),$

is well defined and converges linearly to $x^*$.

Remark. The iteration (3.12)-(3.13) is equivalent to (1.6)-(1.7), but this form
makes the proof easier.

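Proof. As demonstrated earlier, it follows from the assumptions that for some
$\delta > 0$ and $|x - x^*| < \delta$, $(\nabla g(x)^T B_k^{-1} \nabla g(x))^{-1}$ exists and is uniformly bounded. Thus,
for $|x^0 - x^*| < \delta$, $x^1$ is well defined. Since $\lambda_k(x^*) = \lambda^*$ for all $k$, we have

    $x^1 - x^* = x^0 - x^* - B_0^{-1} l_x(x^0, \lambda_0(x^0))$

    $\qquad = B_0^{-1} \{ B_0 - l_{xx}(x^*, \lambda^*) - \nabla g(x^*) \nabla \lambda_0(x^*)^T \} (x^0 - x^*) + h^0(x^0),$

where $\nabla \lambda_0(x^*)$ denotes the Jacobian of $\lambda_0$ at $x = x^*$ and $|h^0(x^0)| \le \alpha^0 |x^0 - x^*|^2$, $\alpha^0$ a
constant. From (3.13) we see that

    $\nabla \lambda_0(x^*)^T = (\nabla g(x^*)^T B_0^{-1} \nabla g(x^*))^{-1} \nabla g(x^*)^T B_0^{-1} (B_0 - l_{xx}(x^*, \lambda^*)).$

Therefore,

    $|x^1 - x^*| \le |B_0^{-1}| \cdot | \{ I - \nabla g(x^*) (\nabla g(x^*)^T B_0^{-1} \nabla g(x^*))^{-1} \nabla g(x^*)^T B_0^{-1} \} (B_0 - l_{xx}(x^*, \lambda^*)) | \cdot |x^0 - x^*| + \alpha^0 |x^0 - x^*|^2.$

Let $V_k^* = I - \nabla g(x^*) (\nabla g(x^*)^T B_k^{-1} \nabla g(x^*))^{-1} \nabla g(x^*)^T B_k^{-1}$ and note that as in (3.2b),
$V_k^* P(x^*) = V_k^*$. Thus

    $|x^1 - x^*| \le |B_0^{-1}| \cdot |V_0^*| \cdot |P(x^*)(B_0 - l_{xx}(x^*, \lambda^*))| \cdot |x^0 - x^*| + \alpha^0 |x^0 - x^*|^2.$

From our assumptions, it now follows that the $|V_k^*|$ will be uniformly bounded by,
say, $\beta > 0$. We now choose $\epsilon$ and $\delta$ small enough so that $\eta \beta \epsilon + \alpha^0 \delta \le \rho < 1$, and
therefore, $|x^1 - x^*| \le \rho |x^0 - x^*|$. The desired result can now be proven by induction
since the sequence $\{\alpha^k\}$ can be uniformly bounded.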

We observe that in the above theorem, condition (ii) could be replaced by

    $|P(x^k)(B_k - l_{xx}(x^*, \lambda^*))| < \epsilon,$

which is consistent with the form in Theorem 3.1.
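The iteration (3.12)-(3.13) can be tried on a small problem. The sketch below, assuming NumPy, uses a toy problem that is not from the paper: minimize $x_1 + x_2$ subject to $x_1^2 + x_2^2 - 2 = 0$, whose solution is $x^* = (-1, -1)$ with $\lambda^* = 1/2$ and $l_{xx}(x^*, \lambda^*) = I$. The matrix $B_k$ is held fixed at a perturbation of the identity, chosen for illustration so that $P(x^*)(B_k - l_{xx}(x^*, \lambda^*))$ is small in norm.

```python
import numpy as np

def f_grad(x):
    return np.array([1.0, 1.0])        # f(x) = x1 + x2

def g_val(x):
    return np.array([x @ x - 2.0])     # g(x) = x1^2 + x2^2 - 2

def g_grad(x):
    return (2.0 * x).reshape(-1, 1)    # n x m convention of section 2

def multiplier(x, B):
    """lambda_k(x) from (3.13)."""
    G = g_grad(x)
    Binv_G = np.linalg.solve(B, G)
    Binv_grad_f = np.linalg.solve(B, f_grad(x))
    return np.linalg.solve(G.T @ Binv_G, g_val(x) - G.T @ Binv_grad_f)

def iterate(x0, B, steps):
    """Iteration (3.12) with a fixed matrix B_k = B; records the errors."""
    x_star = np.array([-1.0, -1.0])
    x, errors = x0.copy(), []
    for _ in range(steps):
        errors.append(np.linalg.norm(x - x_star))
        lam = multiplier(x, B)
        x = x - np.linalg.solve(B, f_grad(x) + g_grad(x) @ lam)
    return x, lam, errors

# Fixed symmetric B close to l_xx(x*, lambda*) = I (illustrative choice).
B_fixed = np.array([[1.2, 0.1], [0.1, 0.9]])
x_final, lam_final, errors = iterate(np.array([-1.2, -0.9]), B_fixed, steps=40)
```

For this starting point the recorded errors decrease toward zero and the multiplier estimate approaches $1/2$, which is the behavior Theorem 3.3 describes; the example is a single instance and says nothing about how large $\epsilon$ and $\delta$ may be in general.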